File API Design Evaluation and Latency Budget

Introduction#

Now that the key choices for a file API service have been made, let's discuss how we meet the non-functional requirements and look at situations where some tweaks or changes might make the service more efficient.

Non-functional requirements#

Let's discuss the non-functional requirements identified in the introduction lesson one by one.

Reliability#

We make shadow copies of service components (API gateways, application servers, UFMS, etc.) that could otherwise become single points of failure (SPOFs). We also keep multiple copies of the data in regionally distributed data centers as backups, which lets us withstand natural disasters. Moreover, we use circuit breakers and appropriate monitoring mechanisms to detect service-critical issues and resolve them before they escalate.

Security#

Our API allows access to authenticated, authorized users only. Users can log in either by authenticating directly with their credentials or via OAuth 2.0 and OIDC, using the authorization code flow with PKCE to obtain a third-party access token. Access tokens reduce the risk of data leakage and loss when dealing with third-party applications. Moreover, the stored data is always encrypted, so an attacker can’t extract any valuable information from it even if the storage is compromised. We assume our system also has an appropriate intrusion detection mechanism to identify bad situations and recover from them.
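The authorization code flow with PKCE relies on a client-generated secret pair. Below is a minimal sketch of deriving the `code_verifier` and its S256 `code_challenge` per RFC 7636; the helper name is illustrative:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    # 32 random bytes -> 43-character URL-safe verifier with padding stripped
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # The challenge is the base64url-encoded SHA-256 digest of the verifier
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The client sends the challenge when requesting the authorization code and reveals the verifier only when exchanging the code for a token, so an intercepted code alone is useless.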

Point to Ponder

Question

How do we ensure that uploaded data is not corrupted en route to the service, and likewise when downloading?

Answer

HTTP uses TCP as the underlying transport protocol, which performs checksums to identify and retransmit data lost over the wire. SSL/TLS also uses a more robust cryptographic integrity check (HMAC) rather than a simple TCP checksum. Additionally, we can hash the data with an algorithm such as MD5 or SHA-256 and pass the value along with the data. After the upload or download completes, the user recalculates the hash using the same algorithm; if it matches the value sent, the data is not corrupted. Hash functions are quite fast on commodity servers, but hashing before upload and after download still incurs an additional computational cost.
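The end-to-end check described above might look like this in practice. A sketch that hashes a stream in chunks, so large files never need to sit fully in memory (function names are illustrative):

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # hash in 64 KB chunks to keep memory usage flat

def sha256_of_stream(stream) -> str:
    """Return the hex SHA-256 digest of a file-like object opened in binary mode."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        h.update(chunk)
    return h.hexdigest()

def verify_integrity(stream, expected_digest: str) -> bool:
    """Recompute the digest after a transfer and compare with the value the
    peer sent alongside the file (e.g., in a response header)."""
    return sha256_of_stream(stream) == expected_digest
```

The same helper serves both directions: the uploader sends the digest with the file, and the downloader recomputes it on the received bytes.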

Scalability#

The UFMS decouples metadata from the actual file content, allowing us to manage and scale users and data independently. A storage scheme capable of handling different structures and schema versions lets us efficiently store large amounts of data along with its encryption details and compression algorithms. We also assume that our storage is horizontally scalable: we can increase capacity by adding back-end storage servers. Finally, strictly following the REST paradigm improves the reliability and stability of our API service in the long run.

Availability#

We use an API gateway to separate public (client-facing) endpoints from private (development) endpoints. We impose rate limits on incoming requests and quota limits on third-party consumer applications to reduce server load and prevent DoS attacks. We also impose a maximum file-upload size to avoid service abuse, and we ensure that the most commonly used endpoints have no single point of failure.
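Rate limiting of the kind described above is often implemented with a token bucket. A minimal in-process sketch, assuming per-client buckets at the gateway (class name and parameters are illustrative):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: refills `rate` tokens per second,
    allowing bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should respond with 429 Too Many Requests
```

A real gateway would keep one bucket per API key (often in a shared store such as Redis), but the accounting is the same.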

Low latency#

We use temporary storage for fast uploads and CDN servers for fast downloads. We use streaming encryption and decryption mechanisms to reduce the delay caused by the encryption and decryption process. We also support data transfers over HTTP/2.0 connections for HTTP/2.0-enabled clients to further reduce latency.

A summary of approaches used to achieve the non-functional requirements is given in the table below:

Achieving Non-Functional Requirements

| Non-Functional Requirement | Approaches |
| --- | --- |
| Reliability | Replication to avoid SPOFs; circuit breakers and service monitoring |
| Security | OAuth 2.0 and OIDC for third-party interactions; storing data encrypted |
| Scalability | Separate metadata and blob storage; persistently storing the schema information (schema version, type of encryption, etc.) along with the data used to read/write the file |
| Availability | Rate limiting incoming requests; limiting the maximum upload size; avoiding SPOFs for popular endpoints |
| Low latency | Temporary storage for fast uploads; CDNs for fast downloads; streaming encryption/decryption to improve response time |

Latency budget#

Let's get a rough estimate of the response time of our file API for a hypothetical scenario where a professional photographer uploads and downloads a 10 MB raw image.

Note: As discussed in the Back-of-the-envelope Calculations for Latency lesson, the latencies of GET and POST requests are affected by different parameters. For GET, the average RTT stays the same regardless of data size because the request itself is small, while the time to download the response grows by 0.4 ms per KB. For POST, the RTT grows by 1.15 ms per KB of request size on top of a base RTT of 260 ms.

Upload request#

Our file API sends a POST request to upload files to the temporary storage at the backend. We assume that this is a standard POST request, and the request body contains file content. Let's calculate the message size of the POST request.

Request and response size#

We know from the “Back-of-the-envelope Calculations for Latency” that a standard POST request is 2 KB, which includes headers and metadata such as fileId, ownerId, authToken, checksum, userList, and so on. By adding the size of the attachment, we get the overall request size by using the following formula:

Request\ size = Request\ size_{standard} + file\ size

Request\ size = 2\ KB + 10240\ KB = 10242\ KB

We encrypt data using the AES algorithm before storing it in blob storage. For a 10 MB file, let's assume that encryption adds approximately 6.191 ms to the processing time of a standard POST request.

Point to Ponder

Question

Will file encryption and decryption affect the response time of the API?

Answer

Yes. Encrypting files before storing them and decrypting them before sending them back to the user increases processing time. This is a tradeoff between response time and the sensitivity of the stored information: if the information is critical, we need to decide how much latency we can tolerate to keep the data safe. That said, the major delay in response time comes from network transfers; servers usually process information much faster than the network can move it, so encrypting data with a fast algorithm may have little impact on the overall response time of an API.

Response time#

Keeping in mind what we learned above about request and response size, let's use the following calculator to find the estimated response time of the POST request:

Response Time Calculator for Uploading a File

  • Request size: 10242 KB
  • Minimum latency: 12159.2 ms
  • Maximum latency: 12240.2 ms
  • Processing time: 10.191 ms
  • Minimum response time: 12169.39 ms
  • Maximum response time: 12250.39 ms


Assuming the request size is 10242 KB:

Time_{latency} = Time_{base} + RTT_{post} + Download

RTT_{post} = RTT_{base} + 1.15 \times Size = 260\ ms + 1.15\ ms/KB \times 10242\ KB

Time_{latency\_min} = Time_{base\_min} + (RTT_{base} + 1.15 \times size\ of\ request\ (KB)) + 0.4

= 120.5 + (260 + 1.15 \times 10242) + 0.4 = 12159.2\ ms

Time_{latency\_max} = Time_{base\_max} + (RTT_{base} + 1.15 \times size\ of\ request\ (KB)) + 0.4

= 201.5 + (260 + 1.15 \times 10242) + 0.4 = 12240.2\ ms

Similarly, the response time is calculated as:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min}

For processing time,

Time_{processing\_min} = Time_{processing\_max} = Time_{processing} + Time_{Encryption} = 4 + 6.191 = 10.191\ ms

Time_{Response\_min} = 12159.2\ ms + 10.191\ ms = 12169.39\ ms

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 12240.2\ ms + 10.191\ ms = 12250.39\ ms
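The upload arithmetic above can be packaged into a small helper. The constants are the lesson's assumed values: base times of 120.5/201.5 ms, a 260 ms base RTT, 1.15 ms per KB uploaded, and 0.4 ms to download the small response:

```python
BASE_MIN_MS, BASE_MAX_MS = 120.5, 201.5  # assumed min/max base times
RTT_BASE_MS = 260.0                       # base RTT for a POST request
POST_MS_PER_KB = 1.15                     # RTT grows 1.15 ms per KB uploaded
RESPONSE_MS = 0.4                         # tiny (1 KB) response at 0.4 ms/KB

def upload_response_time(size_kb: float, processing_ms: float) -> tuple[float, float]:
    """Return (min, max) response time in ms for uploading `size_kb` KB."""
    rtt_post = RTT_BASE_MS + POST_MS_PER_KB * size_kb
    latency_min = BASE_MIN_MS + rtt_post + RESPONSE_MS
    latency_max = BASE_MAX_MS + rtt_post + RESPONSE_MS
    return (round(latency_min + processing_ms, 2),
            round(latency_max + processing_ms, 2))
```

Calling `upload_response_time(10242, 10.191)` reproduces the 12169.39 ms and 12250.39 ms figures above; change the file size or processing time to explore the budget.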

Download request#

Let's continue with the same example and download the file uploaded in the previous section.

Request and response size#

Our file API downloads files from the server using GET requests. Let's assume the standard GET request size of 1 KB. The server returns the 10240 KB file uploaded in the previous section; as with the upload, we add the standard 2 KB of headers and metadata to get the response size used to calculate the response time of the GET request:

Response\ size = 2\ KB + 10240\ KB = 10242\ KB

Response time#

Assuming that decrypting the file takes approximately 6.191 ms, let's plug the response size into the calculator below to estimate the response time for downloading a 10 MB file.

Response Time Calculator for Downloading a 10 MB File

  • Response size: 10242 KB
  • Minimum latency: 4287.3 ms
  • Maximum latency: 4368.3 ms
  • Processing time: 10.191 ms
  • Minimum response time: 4297.49 ms
  • Maximum response time: 4378.49 ms

Assuming that we send a standard GET request and the response size is 10242 KB:

Time_{latency} = Time_{base} + RTT_{get} + Download

Time_{latency\_min} = Time_{base\_min} + 70 + 0.4 \times size\ of\ response\ (KB)

= 120.5 + 70 + 0.4 \times 10242 = 4287.3\ ms

Time_{latency\_max} = Time_{base\_max} + 70 + 0.4 \times size\ of\ response\ (KB)

= 201.5 + 70 + 0.4 \times 10242 = 4368.3\ ms

Similarly, the response time is calculated as follows:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min}

For processing time,

Time_{processing\_min} = Time_{processing\_max} = Time_{processing} + Time_{Decryption} = 4 + 6.191 = 10.191\ ms

Time_{Response\_min} = 4287.3\ ms + 10.191\ ms = 4297.49\ ms

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 4368.3\ ms + 10.191\ ms = 4378.49\ ms
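The download calculation follows the same shape with the GET constants: a fixed 70 ms RTT and 0.4 ms per KB of response. A small helper using the lesson's assumed values:

```python
BASE_MIN_MS, BASE_MAX_MS = 120.5, 201.5  # assumed min/max base times
RTT_GET_MS = 70.0                         # fixed RTT for a small GET request
GET_MS_PER_KB = 0.4                       # download cost per KB of response

def download_response_time(size_kb: float, processing_ms: float) -> tuple[float, float]:
    """Return (min, max) response time in ms for downloading `size_kb` KB."""
    download = GET_MS_PER_KB * size_kb
    latency_min = BASE_MIN_MS + RTT_GET_MS + download
    latency_max = BASE_MAX_MS + RTT_GET_MS + download
    return (round(latency_min + processing_ms, 2),
            round(latency_max + processing_ms, 2))
```

Calling `download_response_time(10242, 10.191)` reproduces the 4297.49 ms and 4378.49 ms figures above, confirming that downloads are roughly three times faster than uploads for the same file under these assumptions.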

A summary of the latency budget for upload and download requests is shown in the illustration below:

Detailed workflow of a file API service

The response time for uploading or downloading a file depends on many factors, such as Internet speed, distance from the server, and file size. Assuming the client and server may be located anywhere in the world, we can interpret the numbers above as a reasonable average estimate of response times.

Optimizations and tradeoffs#

This section will discuss some interesting scenarios or variants of the API and what changes could be brought to optimize our service. Let's dive right in.

Large file upload: Uploading large files (say, 1 GB) takes a long time, and an interrupted upload may force us to reupload the entire file. HTTP/1.1 supports uploading files in byte ranges, which the server can assemble once the upload is complete. If the upload is interrupted, we can send a HEAD request to learn the last chunk the server received, then send the next chunk and continue from where we left off.

Resumable file upload

Note: We send a PUT request to upload the file because we don't want to create a new resource when resuming an interrupted request. The status code 308 (Resume Incomplete) means the file exists and we can continue uploading the remaining data instead of reuploading from scratch.
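The resume flow above can be sketched with two small helpers that do the range bookkeeping; the endpoint layout in the comments is an assumption, and the function names are illustrative:

```python
def content_range_header(first_byte: int, last_byte: int, total: int) -> str:
    """Content-Range value for one chunk of a resumable upload."""
    return f"bytes {first_byte}-{last_byte}/{total}"

def next_chunk_range(received_bytes: int, total: int, chunk: int) -> tuple[int, int]:
    """Given how many bytes the server already holds, pick the next byte range
    (inclusive ends, as Content-Range requires)."""
    first = received_bytes
    last = min(first + chunk, total) - 1
    return first, last

# Sketch of the resume flow (URL scheme and server semantics are assumptions):
#   1. HEAD /files/{id}  -> server reports how many bytes it has received
#   2. PUT  /files/{id}  with Content-Range: bytes N-M/total and the next chunk
#   3. repeat; the server replies 308 until the final byte (total - 1) arrives
```

For a 10 MB (10485760-byte) file interrupted at the 5 MB mark with 1 MB chunks, `next_chunk_range(5242880, 10485760, 1048576)` yields the range for the sixth chunk, and `content_range_header` formats it for the PUT request.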

Point to Ponder

Question

Is there a way to use HTTP/1.1 to reduce latency for uploading large files?

Answer

Yes, we can reduce the latency of uploading large files with HTTP/1.1 by splitting the data into small chunks, each identified by its byte range, and transferring them over multiple connections in parallel, which reduces latency significantly. After all the chunks are uploaded, the server reassembles them into one object.

Note: When dealing with chunks of the same object, use the ETag (entity tag) value to identify which parts belong together. Otherwise, chunks could be mismatched and the assembled object flagged as corrupted.

[Graph: upload time decreases as the number of parallel TCP connections increases]

The graph above shows that we can achieve faster uploads by creating multiple TCP connections simultaneously. However, most browsers allow only six concurrent connections to the same host, and the gain also depends on the available client-to-server bandwidth. So, this technique may not help in all cases.
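A sketch of the parallel-chunk upload described above, where `send_range` stands in for whatever function issues one ranged request, and the default of six workers mirrors the browser connection limit (all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def split_ranges(total: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, total) bytes into `parts` contiguous ranges with inclusive ends."""
    step = -(-total // parts)  # ceiling division
    return [(i, min(i + step, total) - 1) for i in range(0, total, step)]

def upload_parallel(send_range, total: int, parts: int = 6) -> None:
    """Issue one ranged upload per worker; `send_range(first, last)` is the
    caller-supplied function that transfers a single byte range."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map submits all chunk uploads; the with-block waits for completion
        list(pool.map(lambda r: send_range(*r), split_ranges(total, parts)))
```

In practice `send_range` would open its own connection and set the appropriate range header; here it is a placeholder so the chunking logic can be shown on its own.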

Multi-file upload: Although HTTP/1.1 supports pipelining, performance suffers when uploading or downloading multiple files if a request for a large file is made first: the remaining small files are delayed behind it. This is known as head-of-line (HOL) blocking. There are workarounds, such as creating multiple connections to get independent responses, but they waste resources. Even with chunked upload of a large file, HTTP/1.1 underperforms HTTP/2.0 at multiplexing, so HTTP/2.0 may be a better choice for such use cases.

Note: The performance difference between HTTP/2.0 and HTTP/1.1 is significant when transferring multiple small/large files over a single TCP connection.

Upload notification: We may send notifications to clients when an operation (upload, download, or delete) is complete. We can do this by adding a push notification service to our API workflow that uses the technique described in the Design a Pub-Sub service chapter.

Quiz

Question

What is the preferred HTTP version for APIs that upload large files on unstable networks?

Answer

HTTP/2.0 and earlier versions are connection-oriented because of the underlying TCP protocol, so their performance can be affected by network disconnections. HTTP/3.0, on the other hand, works with QUIC, which takes advantage of the connectionless nature of UDP, making it better suited to unstable networks.

Note: QUIC runs over UDP, which is connectionless, but QUIC itself is a connection-oriented protocol that implements its own connection management at the transport layer.
